Note 1: COVID-19 data as of March 30, 2020 [1]; expected cases derived from GPW2020 [2]. Bayesian spatial models follow [4-6].
Note 2: Because no time-varying covariates are available, the risk models were fitted to the most recent aggregated data. For the disease mapping, the breaks were fixed so that the maps show real case counts at March 30.
   NAME     cases    ses hh.Disab minorities resid.Transport
18 Cuyahoga   449 0.5541   0.5024     0.8354          0.6332
20 Defiance     5 0.2427   0.4432     0.3894          0.1996
22 Erie         5 0.2621   0.5699     0.4333          0.2783
63 Paulding    NA 0.3420   0.6558     0.2951          0.2197
69 Putnam      NA 0.0618   0.1722     0.2677          0.0108
88 Wyandot      1 0.2299   0.6603     0.3967          0.1477
Cases (30-03-2020)
Relative risk maps generated with INLA.
Bayesian inference does not produce p-values. Instead, the importance of a variable can be judged by checking whether the interval between its 2.5% and 97.5% posterior quantiles overlaps zero.
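The interval check described above can be sketched in a few lines. This is a minimal illustration in Python using simulated draws, not the R-INLA output behind this report; the function names and the two simulated effects are hypothetical.

```python
import math
import random


def quantile(sorted_values, p):
    """Linearly interpolated quantile of an already sorted list."""
    position = p * (len(sorted_values) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def credible_interval(draws, lower_p=0.025, upper_p=0.975):
    """Equal-tailed interval between the 2.5% and 97.5% posterior quantiles."""
    ordered = sorted(draws)
    return quantile(ordered, lower_p), quantile(ordered, upper_p)


def excludes_zero(draws):
    """True when the whole interval lies on one side of zero."""
    low, high = credible_interval(draws)
    return low > 0 or high < 0


# Hypothetical posterior draws for two coefficients (simulated, not real output).
rng = random.Random(1)
strong_effect = [rng.gauss(2.0, 0.5) for _ in range(4000)]
weak_effect = [rng.gauss(0.1, 1.0) for _ in range(4000)]

print(credible_interval(strong_effect), excludes_zero(strong_effect))
print(credible_interval(weak_effect), excludes_zero(weak_effect))
```

For effects reported on the odds scale rather than the log scale, the reference value is 1 instead of 0.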
In this case, we use the inverse of the distance to the three main airports of Ohio as a covariate, since no other information about the areas is available. Because coordinates are expressed in longitude and latitude, great-circle distances are used. Inverse distance to these locations can be used to test for increased risk in the areas around the airports. For the methodology, Jung [8] and Zhang and Lin [9] showed a link between GLMs and the spatial scan statistic of Kulldorff [10]. The idea is to use a dummy variable that is 1 for the areas inside the cluster and 0 for the areas outside it. Jung also discussed how to extend model-based approaches for the detection of spatial disease clusters to space and time [8].
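As a rough sketch of the covariate construction, the snippet below computes a great-circle (haversine) distance and its inverse in Python. The airport coordinates are approximate values supplied here for illustration, and the 1 km floor and the kilometre units are assumptions; the report does not state how the ID.CLE, ID.CVG and ID.CMH covariates were scaled.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

# Approximate airport locations as (longitude, latitude); illustrative assumptions.
AIRPORTS = {
    "CLE": (-81.85, 41.41),  # Cleveland
    "CMH": (-82.89, 40.00),  # Columbus
    "CVG": (-84.67, 39.05),  # Cincinnati
}


def great_circle_km(lon1, lat1, lon2, lat2):
    """Haversine distance in kilometres between two lon/lat points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def inverse_distance(lon, lat, airport, min_km=1.0):
    """Inverse great-circle distance to an airport, floored to avoid division by zero."""
    airport_lon, airport_lat = AIRPORTS[airport]
    return 1.0 / max(great_circle_km(lon, lat, airport_lon, airport_lat), min_km)


# Centroid coordinates of the Cuyahoga cluster as listed in the cluster table.
cuyahoga = (-81.65864, 41.42447)
for code in AIRPORTS:
    print(code, inverse_distance(*cuyahoga, code))
```

Areas close to an airport receive large covariate values, so a positive coefficient in the count model would indicate elevated risk near that airport.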
First, consider a Poisson model in which the observed cases \(O_{i}\) in county \(i\), with expected counts \(E_{i}\), are modeled as:
\(O_{i} \sim \mathrm{Po}(E_{i}\theta_{i})\)
\(\log(\mu_{i,t})=\log(E_{i,t})+\alpha+\beta x_{i}\)
where \(\mu_{i,t}\) is the mean of the Poisson distribution for county \(i\) at time \(t\), which equals \(E_{i}\theta_{i}\) in the purely spatial case. \(x_{i}\) is a covariate of the outcome of interest, and \(\theta_{i}\) is the relative risk, which measures how far the incidence of COVID-19 deviates from the expected number of cases. An estimate of the relative risk that does not require covariates is the standardized incidence ratio (SIR), defined as \(O_{i}/E_{i}\) [11].
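To make the SIR concrete, here is a small Python sketch using internal standardization, where each area's expected count is its population times the overall rate. The three areas and their numbers are hypothetical, and the report does not state that expected counts were computed exactly this way.

```python
def expected_counts(observed, population):
    """Expected cases per area if every area shared the overall incidence rate."""
    overall_rate = sum(observed.values()) / sum(population.values())
    return {area: population[area] * overall_rate for area in observed}


def standardized_incidence_ratio(observed, expected):
    """SIR = O_i / E_i for each area."""
    return {area: observed[area] / expected[area] for area in observed}


# Hypothetical areas, not counties from this report.
observed = {"A": 30, "B": 10, "C": 0}
population = {"A": 1000, "B": 1000, "C": 2000}

expected = expected_counts(observed, population)
sir = standardized_incidence_ratio(observed, expected)
for area in observed:
    print(area, expected[area], sir[area])
```

An SIR above 1 means more cases than expected, and an SIR below 1 means fewer. Areas with small expected counts give unstable SIRs, which is one motivation for the model-based risk estimates used here.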
After fitting this model, Gómez-Rubio et al. [12] propose adding cluster covariates as follows:
\(\log(\mu_{i,t})=\log(E_{i,t})+\alpha+\beta x_{i}+\gamma_{j}C^{(j)}_{i,t}\)
where \(C^{(j)}_{i,t}\) denotes a dummy variable associated with cluster \(j\), with \(j\) taking values from 1 to the number of clusters. Finally, a zero-inflated Poisson (ZIP) distribution is used to account for the counties that report zero COVID-19 cases. The package DClusterm was used for the cluster detection [13].
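The zero-inflated Poisson distribution mixes a point mass at zero (with probability pi) and an ordinary Poisson count. The Python sketch below writes out its probability mass function; the function names are illustrative. The only value taken from this report is the zero-inflation intercept of 2.48 on the logit scale, and the values of mu and pi in the check are arbitrary.

```python
import math


def inverse_logit(eta):
    """Map a logit-scale linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def zip_pmf(k, mu, pi):
    """P(Y = k) for a zero-inflated Poisson with Poisson mean mu and extra-zero probability pi."""
    poisson = math.exp(-mu) * mu ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# Baseline extra-zero probability implied by a logit intercept of 2.48
# when all covariates are zero (about 0.92).
baseline_pi = inverse_logit(2.48)
print(baseline_pi)

# The probabilities sum to one and the mean is (1 - pi) * mu.
mu, pi = 3.0, 0.3
total = sum(zip_pmf(k, mu, pi) for k in range(60))
mean = sum(k * zip_pmf(k, mu, pi) for k in range(60))
print(total, mean)
```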
Call:
zeroinfl(formula = cases ~ offset(log(E)) + ID.CLE + ID.CVG + ID.CMH |
    ID.CLE + ID.CVG + ID.CMH, data = mystfdf, dist = "poisson",
    x = TRUE)

Pearson residuals:
     Min       1Q   Median       3Q      Max
-3.69706 -0.38293 -0.18822 -0.05094 16.30579

Count model coefficients (poisson with log link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.25136    0.06496  -3.869 0.000109 ***
ID.CLE      20.45994    1.33958  15.273  < 2e-16 ***
ID.CVG      -4.40413    2.60997  -1.687 0.091521 .
ID.CMH       5.02735    0.95852   5.245 1.56e-07 ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)     2.48       0.58   4.275 1.91e-05 ***
ID.CLE       -277.96      67.16  -4.139 3.49e-05 ***
ID.CVG        -99.66      29.09  -3.426 0.000614 ***
ID.CMH        -35.16      12.67  -2.775 0.005526 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of iterations in BFGS optimization: 392
Log-likelihood: -1133 on 8 Df
                    x        y size      minDateCluster
Mahoning41  -80.77631 41.01464    3 2020-03-16 20:00:00
Greene75    -83.88989 39.69147    9 2020-03-20 20:00:00
Lucas109    -83.65850 41.61987   14 2020-03-24 20:00:00
Cuyahoga7   -81.65864 41.42447    6 2020-03-14 20:00:00
Franklin103 -83.00930 39.96954    5 2020-03-23 20:00:00
                 maxDateCluster statistic       pvalue      risk cluster
Mahoning41  2020-03-28 20:00:00 25.509364 9.149348e-13 0.7130359    TRUE
Greene75    2020-03-20 20:00:00 15.389906 2.890292e-08 2.7773964    TRUE
Lucas109    2020-03-28 20:00:00 10.381898 5.195593e-06 0.4913647    TRUE
Cuyahoga7   2020-03-21 20:00:00  3.350700 9.633724e-03 0.2049237    TRUE
Franklin103 2020-03-27 20:00:00  3.249359 1.079523e-02 0.1789636    TRUE
            alpha_bonferroni
Mahoning41      2.083333e-05
Greene75        2.083333e-05
Lucas109        2.083333e-05
Cuyahoga7       2.083333e-05
Franklin103     2.083333e-05
Most significant spatio-temporal clusters of COVID-19 detected in Ohio.
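The Bonferroni column in the cluster output can be read with a short Python check that compares each reported p-value with the listed threshold. The p-values and the threshold are copied from the output; the nominal level of 0.05 used to back out the number of tests is an assumption, since the report does not state it.

```python
# p-values and threshold copied from the cluster output.
P_VALUES = {
    "Mahoning41": 9.149348e-13,
    "Greene75": 2.890292e-08,
    "Lucas109": 5.195593e-06,
    "Cuyahoga7": 9.633724e-03,
    "Franklin103": 1.079523e-02,
}
ALPHA_BONFERRONI = 2.083333e-05

# Which clusters fall below the Bonferroni-adjusted threshold.
passes_bonferroni = {name: p < ALPHA_BONFERRONI for name, p in P_VALUES.items()}
for name, passed in passes_bonferroni.items():
    print(name, passed)

# Number of tests implied by the threshold, ASSUMING a nominal level of 0.05.
ASSUMED_NOMINAL_ALPHA = 0.05
implied_tests = round(ASSUMED_NOMINAL_ALPHA / ALPHA_BONFERRONI)
print(implied_tests)
```

Under this reading, Mahoning, Greene and Lucas fall below the adjusted threshold while Cuyahoga and Franklin do not, so the `cluster` flag in the output presumably reflects a less strict criterion.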
Currently, Mahoning, Cuyahoga and Miami counties show the highest risk (>2.2) for COVID-19. However, the outbreak is present in all states at this time.
Minorities showed higher odds of contracting COVID-19 (mean=5.45 [2.16, 13.53]).
For the zero-inflated model, the Cleveland and Columbus airports show significant associations with the observed cases in the data.
Mahoning county is the most persistent cluster (Red), lasting at least 13 days. This pattern might arise because the counties on the eastern border are roughly equidistant from Cleveland Airport and Pittsburgh Airport.
Cleveland and Akron form the second most persistent cluster, from March 15 to March 22 (Purple).
Toledo and Columbus share clusters over a similar time frame in the fourth week of March (Green and Orange).
[1] Data from The New York Times, based on reports from state and local health agencies. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html.
[2] Center for International Earth Science Information Network - CIESIN - Columbia University, United Nations Food and Agriculture Programme - FAO, and Centro Internacional de Agricultura Tropical - CIAT. 2005. Gridded Population of the World, Version 4 (GPWv4.11): Population Count Grid. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). http://dx.doi.org/10.7927/H4639MPP. Accessed 21 03 2020.
[3] ACS County-to-County Migration Flows 2013-2017. https://www.census.gov/topics/population/migration.html
[4] CDC SVI 2018 Documentation, 1/31/2020. https://svi.cdc.gov/Documents/Data/2018_SVI_Data/SVI2018Documentation.pdf
[5] Moraga, Paula. (2019). Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny. Chapman & Hall/CRC Biostatistics Series.
[6] Blangiardo M, Cameletti M, Baio G, Rue H (2013). “Spatial and Spatio-temporal Models with R-INLA.” Spatial and Spatio-temporal Epidemiology, 4, 33–49.
[7] Flanagan, Barry E.; Gregory, Edward W.; Hallisey, Elaine J.; Heitgerd, Janet L.; and Lewis, Brian (2011) “A Social Vulnerability Index for Disaster Management,” Journal of Homeland Security and Emergency Management: Vol. 8: Iss. 1, Article 3. DOI: 10.2202/1547-7355.1792
[8] Jung I (2009). “A Generalized Linear Models Approach to Spatial Scan Statistics for Covariate Adjustment.” Statistics in Medicine, 28(7), 1131–1143. doi:10.1002/sim.3535.
[9] Zhang T, Lin G (2009). “Spatial Scan Statistics in Loglinear Models.” Computational Statistics & Data Analysis, 53(8), 2851–2858. doi:10.1016/j.csda.2008.09.016.
[10] Kulldorff M (1997). “A Spatial Scan Statistic.” Communications in Statistics – Theory and Methods, 26(6), 1481–1496. doi:10.1080/03610929708831995.
[11] Waller LA, Gotway CA (2004). Applied Spatial Statistics for Public Health Data. John Wiley & Sons. doi:10.1002/0471662682.
[12] Gómez-Rubio V, Moraga P, Molitor J (2018). “Fast Bayesian Classification for Disease Mapping and the Detection of Disease Clusters.” In M Cameletti, F Finazzi (eds.), Quantitative Methods in Environmental and Climate Research, pp. 1–27. Springer-Verlag. doi:10.1007/978-3-030-01584-8_1.
[13] Gómez-Rubio V, Moraga P, Molitor J, Rowlingson B (2019). “DClusterm: Model-Based Detection of Disease Clusters.” Journal of Statistical Software, 90(14), 1–26. doi:10.18637/jss.v090.i14.